1 Problem

The original motivation for our project is to understand the weather patterns, and how they have changed, in one of the most globally relevant cities: London. We therefore chose a project of the 1st format (using a public dataset). The chosen dataset (london_weather.csv, obtained from Kaggle at https://www.kaggle.com/datasets/emmanuelfwerr/london-weather-data) can be treated as a time series, since it records how a fixed set of variables, measured at a single location, varied over time (between 1979 and 2021).

The goal of the project is to use machine learning algorithms to predict the category of two of the dataset's variables (precipitation and snow) for a new observation given to the model. The categories are “rains” or “does not rain” and “snows” or “does not snow”. The plan is to fit three distinct methods (Logistic Regression, KNN, and Decision Tree), compare them using accuracy-related metrics, and see which one performs best.
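As a minimal sketch of how the two binary categories could be derived from the raw measurements, the snippet below maps each reading to 1 or 0. The threshold of 0 and the helper name `to_label` are assumptions made for illustration, not the project's confirmed rule; note also that `snow_depth` measures snow on the ground, so using it as a proxy for "snows" is itself an assumption.

```python
import math
from typing import Optional


def to_label(amount: Optional[float], threshold: float = 0.0) -> Optional[int]:
    """Map a daily measurement to 1 (event occurred) or 0 (it did not).

    Missing readings (None or NaN) stay None so they can be dropped or
    imputed explicitly later instead of being silently counted as "no".
    """
    if amount is None or math.isnan(amount):
        return None
    return int(amount > threshold)


# Hypothetical sample rows that reuse the dataset's column names.
days = [
    {"precipitation": 0.4, "snow_depth": 9.0},
    {"precipitation": 0.0, "snow_depth": 0.0},
    {"precipitation": 2.1, "snow_depth": float("nan")},
]

for day in days:
    day["rains"] = to_label(day["precipitation"])
    day["snows"] = to_label(day["snow_depth"])
    print(day["rains"], day["snows"])
```

Keeping missing values as `None` makes the handling of gaps an explicit later decision rather than a side effect of the labeling step.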

2 Data

The chosen dataset was built by combining measurements from separate requests for individual weather attributes provided by the European Climate Assessment (ECA). The measurements in this particular dataset were recorded by a weather station near Heathrow Airport in London, United Kingdom. The original size of the dataset, along with a list of its attributes and their descriptions, is given below:

london_weather.csv - 15341 observations x 10 attributes:

  • date - date on which the measurement was taken - (int)

  • cloud_cover - cloud cover measured in oktas - (float)

  • sunshine - sunshine duration measured in hours (hrs) - (float)

  • global_radiation - irradiance measured in watts per square metre (W/m2) - (float)

  • max_temp - maximum temperature recorded, in degrees Celsius (°C) - (float)

  • mean_temp - mean temperature recorded, in degrees Celsius (°C) - (float)

  • min_temp - minimum temperature recorded, in degrees Celsius (°C) - (float)

  • precipitation - precipitation measured in millimetres (mm) - (float)

  • pressure - pressure measured in pascals (Pa) - (float)

  • snow_depth - snow depth measured in centimetres (cm) - (float)
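Because `date` is stored as an integer, it usually has to be converted before the data can be handled as a time series. The sketch below assumes the integer encodes the day as YYYYMMDD (for example 19790101); that layout and the helper name `parse_date` are assumptions, since the attribute list only gives the type.

```python
from datetime import date, datetime


def parse_date(raw: int) -> date:
    """Convert an integer such as 19790101 into datetime.date(1979, 1, 1).

    Raises ValueError if the digits do not form a valid calendar day.
    """
    return datetime.strptime(str(raw), "%Y%m%d").date()


first_day = parse_date(19790101)
print(first_day.isoformat(), first_day.month)
```

With real dates available, the rows can be sorted chronologically and calendar features such as the month can be derived.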

2.1 Data preparation

#Entire Code of ETL data

2.2 Dataframe used

Showing the first 10000 rows of the prepared dataframe.

3 Models

First, we need a baseline solution for the proposed problem. Intuition and common sense suggest that very cold days are more likely to have snow, while warmer days are more likely to have rain and no snow. We take this as our baseline solution, to be compared with the classification models mentioned above (Logistic Regression, Decision Tree, and KNN).
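One possible reading of this baseline is a single temperature threshold: days at or below it are predicted as snow, and warmer days as rain without snow. The sketch below follows that reading. The 0 °C cut-off, the use of `mean_temp`, and the function names are illustrative assumptions, and the sample values are made up.

```python
from typing import Sequence, Tuple


def baseline_predict(mean_temp: float, cold_threshold: float = 0.0) -> Tuple[int, int]:
    """Return (rains, snows) for one day from its mean temperature alone.

    Cold days (at or below the threshold) are predicted as snow and no rain;
    warmer days are predicted as rain and no snow.
    """
    is_cold = mean_temp <= cold_threshold
    return (0, 1) if is_cold else (1, 0)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the observed labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up temperatures and snow labels, only to exercise the functions.
temps = [-2.0, 1.5, 12.0, 0.0]
snow_true = [1, 0, 0, 0]
snow_pred = [baseline_predict(t)[1] for t in temps]
print(accuracy(snow_true, snow_pred))
```

Scoring the baseline with the same accuracy function as the trained models keeps the comparison like for like.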

3.1 Logistic Regression

3.2 Decision Tree

3.3 KNN (K-Nearest Neighbors)

4 Comparing models
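The comparison of the three classifiers by accuracy could look roughly like the sketch below, which uses scikit-learn. It runs on synthetic stand-in data rather than the real dataset, and the hyperparameters (`n_neighbors=5`, `max_depth=5`), the 25% chronological hold-out, and the function name `compare_models` are assumptions chosen for illustration, not the project's actual settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, test_size=0.25):
    """Fit the three classifiers and return {name: test accuracy}.

    shuffle=False keeps the chronological order, so the models are tested on
    the most recent rows rather than on rows interleaved with training data.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )
    models = {
        # Scaling matters for distance-based and gradient-based models.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data (not the real dataset): the first feature plays the
# role of a temperature, and the label is 1 on the colder rows.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=6.0, size=(400, 3))
y = (X[:, 0] + rng.normal(scale=2.0, size=400) < 6.0).astype(int)

scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

If one of the classes turns out to be rare, accuracy alone can look high for a model that almost always predicts the majority class, so it may be worth reporting a confusion matrix alongside it.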

5 Conclusion

All code for the project can be found in this repository: https://github.com/bmorbin/ML_Project.

